Good Simulation Code I: Design

How to Structure Code: From Monolithic to Modular Design

Vladislav Morozov

Introduction

Lecture Info

Learning Outcomes

This lecture is about writing clean Monte Carlo code


By the end, you should be able to

  • Start with a simple function-based simulation and recognize its place and limitations
  • Refactor your code into modular reusable classes
  • Understand how modularity leads to clearer, more extensible code

References

Statistics:

  • Chapter 4 of Hansen (2022) for OLS
  • Chapter 14 of Hansen (2022) for time series basics

Programming

  • Chapter 2-3 of Hillard (2020) on basics of design
  • Chapter 26-27 of Lutz (2025) on OOP in Python

Simulation Setting

Context: Bias of OLS Estimator

Consider the simple causal linear model:

\[ Y_{t}^x = \beta_0 + \beta_1 x + U_t, \quad t=1, \dots, T \] \(Y_t^x\) — potential outcome under \(x\) for observation \(t\)


Basic econometrics tells us that the OLS estimator from regressing \(Y_t\) on \((1, X_t)\) is

  • Unbiased if \(\E[U_t|X_1, \dots, X_T] = 0\)
  • Possibly biased otherwise

Why Care About Bias?

Why should a user of OLS care about this bias?


If bias is big, can lead to

  • Improperly centered confidence intervals
  • Wrong sign of point estimate


\(\Rightarrow\) both can lead to incorrect policy conclusions

Today’s Questions


  • How big is the bias?
  • How to think about and structure simulations in general?


Today — talk mostly about code design

  • Our three-step simulation anatomy
  • A sketch for extensible simulation design — foundation for later

Setting Out Simulation Goals

Choosing Context

Why Not Theory?

Theory not fully satisfactory: expression for bias depends on the DGP for \((X_t, U_t)\): \[ \E[\hat{\bbeta}] - \bbeta= \E\left[ (\bW'\bW)^{-1}\bW'\bU \right] \] where \(\bW\) is \(T\times 2\) with \(t\)th row given by \((1, X_t)\)

  • Bao and Ullah (2007): asymptotic order \(O(1/T)\)
  • Unclear if actual magnitude “important in practice”

Why Not Use Real Data?

Empirical validation is not an option in either causal or predictive settings:

  • Can never measure the true bias, even if we believe in the linear model
  • In predictive settings: don’t even care about \(\hat{\bbeta}\), only about predicted \(\hat{Y}_t\)


Have to resort to simulations

Simulation Requirements


Recall that simulations should be

  • Realistic: need to think about relevant real world scenario to emulate

  • Targeted:

    • Be specific in terms of target metrics
    • Focus on DGPs that cause differences in target metrics

Choosing Scenario: Time Series

As an example, we care about time series


What’s relevant?

  • Possibility of dependence across time
  • Different strength of dependence
  • Different lengths of time series
  • (More advanced): data drift/time-varying DGP

We’ll include the first three

Metrics

For simplicity, we will consider just one explicit metric:

Our metric: absolute bias of OLS estimator of \(\beta_1\):

\[ \text{Bias}(\hat{\beta}_1) = \E[\hat{\beta}_1] - \beta_1 \]

Other metrics:

  • Coverage of CIs based on the OLS estimator
  • Proportion of incorrect signs (\(\mathrm{sgn}(\hat{\beta}_1)\neq \mathrm{sgn}(\beta_1)\))
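Each of these metrics is a simple average over the Monte Carlo replications. A minimal sketch of how they could be computed (the function name `summarize_metrics` and the confidence-interval inputs are hypothetical; the lecture code only records estimation errors):

```python
import numpy as np


def summarize_metrics(beta1_hats, ci_lowers, ci_uppers, beta1=0.5):
    """Estimate all three metrics from one batch of simulated results.

    Args:
        beta1_hats: array of simulated slope estimates, one per replication.
        ci_lowers, ci_uppers: bounds of the matching confidence intervals.
        beta1: true slope used in the DGP.
    """
    # Bias: average deviation of estimates from the truth
    bias = np.mean(beta1_hats) - beta1
    # Coverage: fraction of CIs that contain the true value
    coverage = np.mean((ci_lowers <= beta1) & (beta1 <= ci_uppers))
    # Sign errors: fraction of estimates with the wrong sign
    wrong_sign = np.mean(np.sign(beta1_hats) != np.sign(beta1))
    return bias, coverage, wrong_sign
```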

Specifying DGPs

The Problem of DGP Choice

We now have to choose data generating processes that reflect

  • Dynamics of different strength
  • Different sample sizes

Ideally: would specify based on some reference empirical datasets

Here: talk in a very stylized manner

Static vs. Dynamic

First dimension: include both static and dynamic cases

Static:

\[ \small Y_{t} = \beta_0 + 0.5 X_{t} + U_{t} \]

Covariate \(X_t\) independent from \(U_t\) and over time

Dynamic:

\[ \small Y_{t} = \beta_0 + \beta_1 Y_{t-1} + U_{t} \]

Dependence strength: the value of \(\beta_1\)

\[ \small \beta_1 \in \curl{0, 0.5, 0.95} \]

Bias appears in the dynamic setting but not the static one, so comparing dynamic vs. static is targeted

Sample Size and Distributions of \(U_t, X_t, Y_0\)

Second and third design dimension:

  1. Sample sizes
  2. DGPs for variables not determined by model

To start with — keep things simple to focus on essentials:

  • One DGP per \(U_t, X_t, Y_0\): each \(N(0, 1)\)
  • \(T=50, 200\)

Different sample sizes: how bias changes (targeted)

First Implementation

Basic Function

Summary So Far

So far specified:

  • Metric
  • How each dataset should be generated
  • Size of each dataset


Now time to actually implement — but how?

Plan for Today

Today we talk about two approaches:

  • Based on a single function (good for starting and small simulations)
  • A more modular approach (better for larger simulations and easier to develop)


Today’s code examples are illustrative, not prescriptive

General Advice

How to not get overwhelmed — a good universal rule:

Start with the simplest pieces and iterate from there


  • Simulations are often a fairly complex piece of code
  • Do not try to write the whole thing in one go
  • Do not be afraid of rewriting and improving things later

Practice: Starting with the Static Case

Our simplest thing: static model with a small sample \[ Y_{t} = \beta_0 + \beta_1 X_t + U_t, \quad T=50 \]

Remember simulation anatomy: for many datasets we

  1. Draw 50 points \((X_t, Y_t)\)
  2. Run OLS estimator \(\hat{\beta}_1\)
  3. Compute error \(\hat{\beta}_1-\beta_1\)

Simplest: wrap in a single function

Example: Basic Simulation Function

import numpy as np


def simulate_ols(n_sim=1000, n_obs=50, beta0=0, beta1=0.5):
    """Simulates OLS estimation for a static/exogenous DGP: y = beta0 + beta1*x + u.

    Args:
        n_sim (int): Number of simulations to run. Defaults to 1000.
        n_obs (int): Number of observations per simulation. Defaults to 50.
        beta0 (float): True intercept value. Defaults to 0.
        beta1 (float): True slope coefficient. Defaults to 0.5.

    Returns:
        list[float]: List of errors of beta1 estimates for each simulation.
    """

    # Create container to store results
    results_ols_errors = []

    # Initialize the RNG
    rng = np.random.default_rng()

    # Core simulation loop
    for _ in range(n_sim):
        # 1. Draw data
        x = rng.normal(size=n_obs)
        u = rng.normal(size=n_obs)
        y = beta0 + beta1 * x + u

        # 2. Run OLS estimation
        W = np.column_stack([np.ones(n_obs), x])
        beta_hat = np.linalg.inv(W.T @ W) @ W.T @ y

        # 3. Compute error
        results_ols_errors.append(beta_hat[1]-beta1)

    return results_ols_errors

Our First Results: Static Model

Now can execute our simulation by calling function with default arguments:

static_results = simulate_ols()
print(f"Bias (static DGP): {np.mean(static_results)}")
Bias (static DGP): -0.008044980668442569

Qualitatively:

  • Bias small relative to coefficients (recall \(\beta_1=0.5\))
  • Theory is confirmed

Distribution of MC Estimates

Can also take a look at density of estimates:

Setting Seeds

Problem: Lack of Reproducibility

What happens if we rerun our code twice?

static_results_1 = simulate_ols()
static_results_2 = simulate_ols()
print(f"Bias (first run): {np.mean(static_results_1)}")
print(f"Bias (second run): {np.mean(static_results_2)}")
Bias (first run): -0.0015702105365570371
Bias (second run): 0.00020604661988740292
  • Same qualitative result
  • Different numerical results

Our results are not reproducible

Setting the RNG Seed

Solution: provide a seed to our RNG

def simulate_ols(n_sim=1000, n_obs=50, beta0=0, beta1=0.5, seed=1):
    """Simulates OLS estimation for a static/exogenous DGP: y = beta0 + beta1*x + u.

    Args:
        n_sim (int): Number of simulations to run. Defaults to 1000.
        n_obs (int): Number of observations per simulation. Defaults to 50.
        beta0 (float): True intercept value. Defaults to 0.
        beta1 (float): True slope coefficient. Defaults to 0.5.
        seed (int): random number generator seed. Defaults to 1.

    Returns:
        list[float]: List of errors of beta1 estimates for each simulation.
    """

    # Create container to store results
    results_ols_errors = []

    # Initialize the RNG
    rng = np.random.default_rng(seed)

    # Core simulation loop
    for _ in range(n_sim):
        # 1. Draw data
        x = rng.normal(size=n_obs)
        u = rng.normal(size=n_obs)
        y = beta0 + beta1 * x + u

        # 2. Run OLS estimation
        W = np.column_stack([np.ones(n_obs), x])
        beta_hat = np.linalg.inv(W.T @ W) @ W.T @ y

        # 3. Compute error
        results_ols_errors.append(beta_hat[1]-beta1)

    return results_ols_errors

Rerunning Simulations

Now rerunning with the same (default) seed gives the same results:

static_results_1 = simulate_ols()
static_results_2 = simulate_ols()
print(f"Bias (first run): {np.mean(static_results_1)}")
print(f"Bias (second run): {np.mean(static_results_2)}")
Bias (first run): -0.00541847302776387
Bias (second run): -0.00541847302776387


Always set seeds explicitly to help reproducibility
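For larger simulations, NumPy also offers `SeedSequence`, which derives statistically independent per-replication streams from one master seed. A sketch of this alternative seeding pattern (an aside; the lecture code passes a single seed directly):

```python
import numpy as np

# One master seed reproducibly spawns independent child sequences
master = np.random.SeedSequence(1)
child_seqs = master.spawn(3)  # one child per simulation replication

# Each child sequence initializes its own independent generator
rngs = [np.random.default_rng(s) for s in child_seqs]

# Rerunning this script with the same master seed reproduces every stream
draws = [rng.normal(size=2) for rng in rngs]
```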

Expanding the Simulation

Expanding the Simulation

Now need to expand simulations to add AR(1) design

Simplest way: just expand existing function

  • Add new DGP option "ar1"
  • Involves changing simulate_ols to have a new argument for dgp_type
    • Old behavior: dgp_type="static"
    • New behavior: dgp_type="ar1"
  • Also deal with losing one observation to \(Y_{t-1}\)

Expanded Simulation Function: With AR(1) DGP

def simulate_ols(n_sim=1000, n_obs=50, beta0=0, beta1=0.5, seed=1, dgp_type="static"):
    """Simulates OLS estimation for static or AR(1) DGP.
    Args:
        n_sim (int): Number of simulations to run. Defaults to 1000.
        n_obs (int): Number of observations per simulation. Defaults to 50.
        beta0 (float): True intercept value. Defaults to 0.
        beta1 (float): True slope coefficient. Defaults to 0.5.
        seed (int): random number generator seed. Defaults to 1.
        dgp_type (str): Type of DGP: "static" or "ar1". Defaults to "static".

    Returns:
        list[float]: List of errors of beta1 estimates for each simulation.
    """

    # Create container to store results
    results_ols_errors = []

    # Initialize the RNG
    rng = np.random.default_rng(seed)

    # Core simulation loop
    for _ in range(n_sim):
        # 1. Draw data
        if dgp_type == "static":
            x = rng.normal(size=n_obs)
            u = rng.normal(size=n_obs)
            y = beta0 + beta1 * x + u
        elif dgp_type == "ar1":
            y = np.zeros(n_obs + 1)
            u = rng.normal(size=n_obs + 1)
            y[0] = beta0 + u[0]
            for t in range(1, n_obs + 1):
                y[t] = beta0 + beta1 * y[t-1] + u[t]
            # Use the last n_obs elements of y as the dependent variable and the first n_obs as x
            x = y[:-1]  # x is y lagged (n_obs elements)
            y = y[1:]  # y[1:] is the dependent variable (n_obs elements)
            

        # 2. Run OLS estimation
        W = np.column_stack([np.ones(n_obs), x])
        beta_hat = np.linalg.inv(W.T @ W) @ W.T @ y
        
        # 3. Compute error
        results_ols_errors.append(beta_hat[1]-beta1)   

    return results_ols_errors

Running AR(1) Simulation

Can now run our simulations with \(\beta_1 \in \curl{0, 0.5, 0.95}\):

ar_results_no_pers = simulate_ols(beta1=0, dgp_type="ar1")
ar_results_med_pers = simulate_ols(beta1=0.5, dgp_type="ar1")
ar_results_high_pers = simulate_ols(beta1=0.95, dgp_type="ar1")
print(f"Bias (no persistence): {np.mean(ar_results_no_pers):.3f}")
print(f"Bias (medium persistence): {np.mean(ar_results_med_pers):.3f}")
print(f"Bias (high persistence): {np.mean(ar_results_high_pers):.3f}")
Bias (no persistence): -0.018
Bias (medium persistence): -0.045
Bias (high persistence): -0.096

Conclusions:

  • More persistence = larger absolute bias
  • Direction: downward (underestimating)
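These magnitudes line up with the classical large-\(T\) approximation for the AR(1) model with intercept (Kendall’s bias formula; an aside, not part of the lecture’s derivations):

\[ \E[\hat{\beta}_1] - \beta_1 \approx -\frac{1 + 3\beta_1}{T} \]

At \(T = 50\) this predicts roughly \(-0.02\), \(-0.05\), and \(-0.08\) for \(\beta_1 = 0, 0.5, 0.95\), close to the simulated values above.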

MC Distribution Under Persistence

Effect of Sample Size

To check effect of sample size, modify n_obs:

ar_results_no_pers = simulate_ols(beta1=0, n_obs=200, dgp_type="ar1")
ar_results_med_pers = simulate_ols(beta1=0.5, n_obs=200, dgp_type="ar1")
ar_results_high_pers = simulate_ols(beta1=0.95, n_obs=200, dgp_type="ar1")
print(f"Bias (no persistence): {np.mean(ar_results_no_pers):.3f}")
print(f"Bias (medium persistence): {np.mean(ar_results_med_pers):.3f}")
print(f"Bias (high persistence): {np.mean(ar_results_high_pers):.3f}")
Bias (no persistence): -0.005
Bias (medium persistence): -0.013
Bias (high persistence): -0.023

Conclusions:

Bias decays as sample size increases

A More Robust Implementation

Problem: How To Expand Simulations?

By now: a working base simulation


But what if we expand our simulations?

  • Try more DGPs (e.g. heavier tails for innovations, different \(X\))
  • More sample sizes
  • Different models (not necessarily \(Y_t = \beta_0 + \beta_1 X_t + U_t\))

Limitations of Current Approach

We need a better approach to organizing simulations

  • Basic approach is fine if you just want to run one thing
  • But difficult if you want to expand more:
    • Adding new DGPs would mean expanding simulation function even more
    • We would have to remember to run all the different cases ourselves

Becomes difficult to scale and maintain

This Lecture: Dealing With Complexity

For the rest of today: dealing with design

  • Untangling the different components of simulation
  • Learning a more modular structure


Next time: dealing with the manual orchestration issue

Towards Modular Design

A Step Back: What Are We Doing?

There are three code tasks:

  1. Data generation: draw data from a DGP
  2. Estimation: apply an estimator to the data
  3. Orchestration: run the above many times and collect results


For now, all three are jumbled together and hardcoded inside simulate_ols(): a monolithic design

Why Should The Simulation Loop Care?

The simulation loop shouldn’t need to know:

  • How data is generated (e.g., AR(1) vs. static)
  • How estimation works (e.g., OLS vs. IV)

It only needs to:

  • Get data from something (DGP)
  • Pass data to something (estimator)
  • Repeat and store results

Background: Separation of Concerns

Key to clear code: divide the various behaviors into small, manageable pieces

Best way to divide — by concern. This:

  • Reduces complexity (each part does one thing well)
  • Improves reusability (swap DGPs/estimators without rewriting everything)
  • Makes collaboration easier (different people can work on different parts)

Interfaces: Glue Between Components

The simulation runner only needs to know about interfaces:

  • DGP interface: Must provide a method like sample(n_obs, seed)
  • Estimator interface: Must provide a method like fit(X, y)


As long as a DGP/estimator implements the given interface, the simulation runner can use it

Their implementation details do not matter

Background: Interfaces

Interfaces are like contracts

  • DGP promises to give valid data if sample() is called
  • Estimator promises to give suitable coefficients if fit() is called


Benefit: You can add new DGPs/estimators without touching the simulation runner!

DGPs Encapsulate Their Logic

Each DGP encapsulates its own logic:

  • AR(1) DGP handles its own loops, initial conditions, etc.
  • Static DGP just draws IID data.


The outside world only sees the sample() method that returns a sample of X and y

Implementing The New Design

Example: Static DGP Class

from typing import Tuple

import numpy as np


class StaticNormalDGP:
    """A data-generating process (DGP) for a static linear model

    Attributes:
        beta0 (float): Intercept term.
        beta1 (float): Slope coefficient.
    """

    def __init__(self, beta0: float = 0.0, beta1: float = 0.5):
        """Initializes the DGP with intercept and slope.

        Args:
            beta0 (float): Intercept term. Defaults to 0.0.
            beta1 (float): Slope coefficient. Defaults to 1.0.
        """
        self.beta0 = beta0
        self.beta1 = beta1

    def sample(
        self, n_obs: int, seed: int | None = None
    ) -> Tuple[np.ndarray, np.ndarray]:
        """Samples data from the static DGP.

        Args:
            n_obs (int): Number of observations to sample.
            seed (int, optional): Random seed for reproducibility. Defaults to None.

        Returns:
            tuple: (x, y) arrays, each of length n_obs.
        """
        rng = np.random.default_rng(seed)
        x = rng.normal(size=n_obs)
        u = rng.normal(size=n_obs)
        y = self.beta0 + self.beta1 * x + u
        return x, y

Discussion of DGP Class

We use a class to:

  • Capture a specific concern (data drawing)
  • Specify interfaces (a sample method)
  • Encapsulate logic (could add more data logic)

Example usage

static_sampler = StaticNormalDGP(beta1=0.4)
x, y = static_sampler.sample(n_obs=5, seed=1)
print(f"x: {x.round(2)} \ny: {y.round(2)}")
x: [ 0.35  0.82  0.33 -1.3   0.91] 
y: [ 0.58 -0.21  0.71 -0.16  0.66]

Example: AR(1) Class

A class for AR(1) with same sample() interface

class DynamicNormalDGP:
    """A data-generating process (DGP) for a dynamic linear model: y_t = beta0 + beta1*y_{t-1} + u_t.

    Attributes:
        beta0 (float): Intercept term.
        beta1 (float): AR(1) coefficient.
    """

    def __init__(self, beta0: float = 0.0, beta1: float = 0.5):
        """Initializes the DGP with intercept and AR(1) coefficient.

        Args:
            beta0 (float): Intercept term. Defaults to 0.0.
            beta1 (float): AR(1) coefficient. Defaults to 0.5.
        """
        self.beta0 = beta0
        self.beta1 = beta1

    def sample(
        self, n_obs: int, seed: int | None = None
    ) -> Tuple[np.ndarray, np.ndarray]:
        """Samples data from the dynamic DGP.

        Args:
            n_obs (int): Number of observations to sample.
            seed (int, optional): Random seed for reproducibility. Defaults to None.

        Returns:
            tuple: (x, y) arrays, each of length n_obs.
                  x is y_{t-1} (lagged y), and y is y_t.
        """
        rng = np.random.default_rng(seed)
        y = np.zeros(n_obs + 1)  # Extra observation for lag
        u = rng.normal(size=n_obs + 1)
        y[0] = self.beta0 + u[0]  # Initial condition
        for t in range(1, n_obs + 1):
            y[t] = self.beta0 + self.beta1 * y[t - 1] + u[t]
        # Return lagged y as x and y[1:] as y
        return y[:-1], y[1:]
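A quick way to sanity-check any such sample() implementation: since x is the lagged series, subtracting \(\beta_0 + \beta_1 x_t\) from \(y_t\) should recover the shocks exactly. A self-contained sketch restating the recursion and the check (standalone code, not part of the class):

```python
import numpy as np

# Minimal restatement of the AR(1) recursion used inside DynamicNormalDGP
rng = np.random.default_rng(1)
beta0, beta1, n_obs = 0.0, 0.5, 5
y_full = np.zeros(n_obs + 1)
u = rng.normal(size=n_obs + 1)
y_full[0] = beta0 + u[0]  # Initial condition
for t in range(1, n_obs + 1):
    y_full[t] = beta0 + beta1 * y_full[t - 1] + u[t]
x, y = y_full[:-1], y_full[1:]

# The contract: x_t is y_{t-1}, so y - beta0 - beta1*x recovers the shocks
assert np.allclose(y - beta0 - beta1 * x, u[1:])
```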

Same Idea: Simple Estimator Class

Hide estimator logic and only provide fit()

class SimpleOLS:
    """A simple OLS estimator for the linear model y = beta0 + beta1*x + u.

    Attributes:
        beta0_hat (float | None): Estimated intercept. None until fit is called.
        beta1_hat (float | None): Estimated slope. None until fit is called.
    """

    def __init__(self) -> None:
        """Initializes the OLS estimator with no estimates."""
        self.beta0_hat: float | None = None
        self.beta1_hat: float | None = None

    def fit(self, x: np.ndarray, y: np.ndarray) -> None:
        """Fits the OLS model to the provided data.

        Args:
            x (np.ndarray): Independent variable (1D array).
            y (np.ndarray): Dependent variable (1D array).
        """
 
        # Add constant to x
        X = np.column_stack([np.ones(len(x)), x])
        # OLS estimation
        beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y
        self.beta0_hat, self.beta1_hat = beta_hat[0], beta_hat[1]

SimpleOLS In Action

To fit an instance of SimpleOLS, just pass data:

ols = SimpleOLS()
x, y = static_sampler.sample(n_obs=5000, seed=1)
ols.fit(x, y)
print(f"Estimated slope coefficient: {ols.beta1_hat:.3f}")
Estimated slope coefficient: 0.366


Now ready to put things together into simulation

Simulation Runner Class

class SimulationRunner:
    """Runs Monte Carlo simulations for a given DGP and estimator.

    Attributes:
        dgp: Data-generating process with a `sample` method.
        estimator: Estimator with a `fit` method and `beta1_hat` attribute.
        errors: array of estimation errors (beta1_hat - beta1) for each simulation.
    """

    def __init__(
        self,
        dgp: StaticNormalDGP | DynamicNormalDGP,
        estimator: SimpleOLS,
    ) -> None:
        """Initializes the simulation runner.

        Args:
            dgp: An instance of a DGP class (must implement `sample`).
            estimator: An instance of an estimator class (must implement `fit`).
        """
        self.dgp = dgp
        self.estimator = estimator
        self.errors = None

    def simulate(self, n_sim: int, n_obs: int, seed: int | None = None) -> None:
        """Runs simulations and stores estimation errors.

        Args:
            n_sim (int): number of simulations to run.
            n_obs (int): Number of observations per simulation.
            seed (int | None): Random seed for reproducibility. Defaults to None.
        """
        # Preallocate array to hold estimation errors
        self.errors = np.empty(n_sim)

        # Run simulation
        for sim_id in range(n_sim):
            # Draw data
            x, y = self.dgp.sample(n_obs, seed=seed + sim_id if seed is not None else None)
            # Fit model
            self.estimator.fit(x, y)
            # Store error
            self.errors[sim_id] = self.estimator.beta1_hat - self.dgp.beta1

Simulation Runner in Action

Simulation flow now:

  • Create DGP and estimator
  • Pass to SimulationRunner and simulate()
# Initialize DGP and estimator
static_dgp = StaticNormalDGP(beta0=0.0, beta1=0.5)
ols_estimator = SimpleOLS()

# Initialize and run simulation
ols_static_sim = SimulationRunner(static_dgp, ols_estimator)
ols_static_sim.simulate(n_sim=1000, n_obs=50, seed=1)

# Summarize bias
print(f"Average estimation error (bias): {ols_static_sim.errors.mean():.4f}")
Average estimation error (bias): -0.0027

Discussion: What Just Happened?

DGP, estimator, and simulation runner are now independent


Benefits:

  • Can add new scenarios without touching the core simulation logic
  • Different people can work on these without conflicts
  • Can reuse components elsewhere
  • Easier to read and change the code

What To Do With Results

SimulationRunner stores raw simulation results (here: full estimation errors)


How these are handled — depends on what you want to do

  • Can add summary methods
  • Can create other objects that process run simulations (and generate figures)

Example: Adding A Summary to SimulationRunner

class SimulationRunner:
    """Runs Monte Carlo simulations for a given DGP and estimator.

    Attributes:
        dgp: Data-generating process with a `sample` method.
        estimator: Estimator with a `fit` method and `beta1_hat` attribute.
        errors: array of estimation errors (beta1_hat - beta1) for each simulation.
    """

    def __init__(
        self,
        dgp: StaticNormalDGP | DynamicNormalDGP,
        estimator: SimpleOLS,
    ) -> None:
        """Initializes the simulation runner.

        Args:
            dgp: An instance of a DGP class (must implement `sample`).
            estimator: An instance of an estimator class (must implement `fit`).
        """
        self.dgp = dgp
        self.estimator = estimator
        self.errors = None

    def simulate(self, n_sim: int, n_obs: int, seed: int | None = None) -> None:
        """Runs simulations and stores estimation errors.

        Args:
            n_sim (int): number of simulations to run.
            n_obs (int): Number of observations per simulation.
            seed (int | None): Random seed for reproducibility. Defaults to None.
        """
        # Preallocate array to hold estimation errors
        self.errors = np.empty(n_sim)

        # Run simulation
        for sim_id in range(n_sim):
            # Draw data
            x, y = self.dgp.sample(n_obs, seed=seed + sim_id if seed is not None else None)
            # Fit model
            self.estimator.fit(x, y)
            # Store error
            self.errors[sim_id] = self.estimator.beta1_hat - self.dgp.beta1

    def summarize_bias(self) -> float:
        """Computes the average estimation error (bias) for beta1.

        Returns:
            float: Average error (beta1_hat - beta1) across simulations.
        """ 
        return np.mean(self.errors)

Example: Summarizing AR(1) Simulation

For example: compute bias in dynamic model with a lot of persistence:

# Initialize DGP and estimator
dynamic_dgp = DynamicNormalDGP(beta0=0.0, beta1=0.95)
ols_estimator = SimpleOLS()

# Initialize and run simulation
ols_dynamic_sim = SimulationRunner(dynamic_dgp, ols_estimator)
ols_dynamic_sim.simulate(n_sim=1000, n_obs=50, seed=1)

# Summarize bias
print(f"Average estimation error (bias): {ols_dynamic_sim.errors.mean():.4f}")
Average estimation error (bias): -0.0992

Recap and Conclusions

Recap


In this lecture we

  • Saw our first simulation
  • Talked about a simple approach to structuring simulation code (function)
  • Discussed the benefits of a more modular structure

How To Interpret Things From Today: Two Final Points

Choose the appropriate level of complexity for your situation:

  • A brief simulation with one DGP does not need deep architecture
  • A complex simulation involving many scenarios and estimators would benefit from a clearer structure

The specific implementations of today are not absolute, but an example of how to think

Further Improvements


Our simulation is in a good prototype shape, but can still improve some things:

  • Clean up relationships between DGP classes and organize them in a common family
  • Add simulation metadata (DGP name, estimator name) for later processing
  • Track progress

Next Questions


  • How to clean up some loose ends in existing code?
  • How to organize running many simulations?
  • How to handle simulation results?
  • How to apply this logic to deeper statistical scenarios?

References

Bao, Yong, and Aman Ullah. 2007. “The Second-Order Bias and Mean Squared Error of Estimators in Time-series Models.” Journal of Econometrics 140 (2): 650–69. https://doi.org/10.1016/j.jeconom.2006.07.007.
Hansen, Bruce. 2022. Econometrics. Princeton, NJ: Princeton University Press.
Hillard, Dane. 2020. Practices of the Python Pro. Shelter Island, NY: Manning Publications Co.
Lutz, Mark. 2025. Learning Python: Powerful Object-Oriented Programming. Sixth edition. Santa Rosa, CA: O’Reilly.